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Abstract 

Localization in indoor environments is a technique which estimates the robot’s pose by fusing data from onboard motion 
sensors with readings of the environment, in our case obtained by scan matching point clouds captured by a low-cost Kinect 
depth camera. We develop both an Invariant Extended Kalman Filter (IEKF)-based and a Multiplicative Extended Kalman Filter 
(MEKF)-based solution to this problem. The two designs are successfully validated in experiments and demonstrate the advantage 
of the IEKF design. 
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I. Introduction and Literature Review 

L OCALIZATION is a fundamental building block for mobile robotics (TJ- F° r operations in environments where GPS is 
not available e.g. indoor navigation, or for situations where GPS is available but degraded, the uncertainty in the vehicle 
pose can be reduced by using information from a 3D map 0. Fundamentally this represents a sensor fusion problem, and 
as such it is typically handled via an Extended Kalman Filter (EKF) c.f. the textbooks a, m, although a number of direct 
nonlinear observer designs have been proposed e.g. 0, 0, 0- The EKF approach offers two practical advantages over these: 
it is a fully systematic design procedure, and it admits a probabilistic interpretation. Indeed it can be viewed as a maximum 
likelihood estimator that combines sensor information in an optimal way (in the linear case), with the confidence in each 
measurement being described by a covariance matrix. 

In this paper, we address a localization problem, where pose estimates from proprioceptive sensors are obtained by matching 
images obtained from a depth scanner (scan matching or LIDAR-based odometry 0) with a 3D map. We use the Iterative 
Closest-Point (ICP) algorithm 0, (Toll on 3D point clouds obtained from a low-cost Kinect camera mounted on our experimental 
wheeled robot. During the motion we assume an already built 3D gross map made of clouds of points is available. When a 
scan contains substantially more information than the map, it can be aggregated to the existing map. Further details will be 
provided in Section Hill 

The sensor fusion EKF works by linearizing a system about its estimated trajectory, then using estimates obtained from 
an observer for this linearized model to correct the state of the original system. In this way the EKF relies on a closed 
loop which can be destabilized by sufficiently poor estimates of the trajectory, known as divergence 0. Clearly, reducing 
or eliminating the dependence of the EKF on the system’s trajectory would increase the robustness of the overall system. 
An emerging methodology to accomplish this goal is the Invariant EKF ifTTI . II 1 21 . built on the theoretical foundations of 
invariant (symmetry-preserving) observers ca, in. The IEKF technique has already demonstrated experimental performance 
improvements over a typical 0 “Multiplicative” EKF (MEKF) in aided inertial navigation designs fl5l . Ifl6ll . From a practical 
viewpoint, the advantage of EKF-like techniques is their automatic tuning of gains based on the estimated covariance of the 
measurements. Applying the IEKF to an aided scan matching problem was first demonstrated in El for 3D mapping. A 
preliminary version of this work appears in conference proceedings ED- The contributions of the present paper are as follows: 

1) Designing a scan matching-aided localization system for a wheeled indoor robot which fuses robot odometry with 
Kinect-based scan matching, 

2) A method to compute a realistic covariance matrix associated to the scan matching step and implicitly detect undercon¬ 
strained environments, 

3) Providing both an IEKF (whose non-linear structure takes advantage of the the geometry of the problem) and a MEKF 
version of the state estimator for this problem, 

4) Experimentally validating both designs, and comparing their accuracy w.r.t. ground truth data provided by an optical 
motion capture system. 

M. Barczyk is with the Department of Mechanical Engineering, University of Alberta, Edmonton, AB, T6G 2G8 Canada 
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This paper is structured as follows. A brief literature review and problem motivation have been provided above. Section HI] 
provides the model of the system dynamics and derives linearizing approximations used in the sequel. Section Hill covers the 
process of obtaining aiding measurements from scan matching of point clouds provided by a Kinect depth camera mounted 
on the robot, and provides a meaningful and simple method to compute a covariance matrix associated to the measurements. 
Section [TV] provides the equations of the invariant observer, IEKF and MEKF state estimators, which are then experimentally 
validated in Section [V] Conclusions and future work directions are given in Section [Vi] 


II. System Models 


A. Noiseless System 

In the absence of sensor noise, the dynamics of a rigid-body vehicle are governed by 


R = RS{co) 
p = Rp 


(1) 


where R £ SO (A) is a rotation matrix measuring the 3D attitude of the vehicle, oj £ R 3 is the body-frame angular velocity 
vector, S(-) is the 3x3 skew-symmetric matrix such that S(x)y = x x y where x denotes the R 3 cross-product, pel 3 
is the position vector of the vehicle expressed in coordinates of the ground-fixed frame, and pi £ R 3 is the velocity vector 
of the vehicle expressed in body-frame coordinates. The body-frame angular velocity ui and linear velocity pi vectors can be 
measured directly using on-board motion sensors, e.g. a triaxial rate gyro for w and a Doppler radar for pi. In the case of a 
wheeled vehicle traveling over flat terrain, odometry information from the left and right wheels can be used to compute the 
forward component of vector pt and the vertical component of vector lu, taking the remaining components as zero. 

As will be discussed in Section [III] the vehicle’s attitude R and position p are computed by scan matching images from 
the on-board Kinect depth camera with a map of 3D points via the Iterative Closest Point (ICP) algorithm. This gives output 
equations 



B. Sensor noise models 

In reality the input signals oj and pt in <[T]) and the output readings iju, y p in (0 are corrupted by sensing noise. For the 
former we employ the sensor model typically used in aided navigation design j4j 

U1 = LJ + V u 
P = pi + 

where v ~ A'"(0. cr 2 ) terms represent additive Gaussian white noise vectors whose covariance can be directly identified from 
logged sensor data. Remark the noise vectors and v p are expressed in coordinates of the body-fixed frame. The noise 
models for scan matching outputs 0 are not standard and will be derived in Section UlI-DI 

C. Geometry of SO( 3) 

The special orthogonal group SO( 3) is a Lie group. For any Lie group G, there exists an exponential map exp : g —> G 
which maps elements of the Lie algebra g to the Lie group G. It is known (e.g. d p. 519]) that for any X £ q, (exp X)" 1 = 
exp(— X), that exp is a smooth map from g to G, and that exp restricts to a diffeomorphism from some neighborhood of 0 
in g to a neighborhood of the identity element e in G. The last fact means there exists in a neighborhood U of e an inverse 
smooth map log : G —> g such that exp o log(g) = g, \/g £ U. 

For the particular case of SO(3), the associated Lie algebra so(3) is the set of 3 x 3 skew-symmetric matrices £ := S(x), x £ 
R 3 and exp : so(3) —> SO( 3) is the matrix exponential 

exp £ = / + £+ H- 

while log : SO(3) —> so(3) is given by 

log R = aS(/3) where a : 2cosa + 1 = trace(i?) and S(/3) =—r—(R — R T ) 

2 sin a 

Now consider the special case of R £ SO(3) close to I. Since the exponential map restricts to a diffeomorphism from a 
neighborhood of zero in so(3) to a neighborhood of identity in SO( 3), £ £ so( 3) is close to the matrix O 3 X 3 such that £ 2 and 
higher-order terms in R = exp £ are negligible and thus 

Rpil + i, R£ SO{ 3) close to I 


(4) 
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Since (exp£) 1 = exp(—£) we also have 

FT 1 « I - Re 50(3) close to I (5) 

Still for R close to I, trac e(R) ~ 3 such that a ss 0 and so log R ~ (f? — R T )/2. We define the projection map 

7r : 50(3) —* so(3), 7r(f?) =--- 

which is defined everywhere but is the inverse of exp only for R = exp£ close to /. In this case n(R) « £ and 0 becomes 

R-Iktt(R), R e 50(3) close to / (6) 

Approximations 0, 0 and <[6l> will be employed in the sequel. 

III. Depth Camera Aiding 

A. Camera Hardware 

Our test robot is equipped with a Kinect depth camera, a low-cost (currently around 100 €) gaming peripheral sold by 
Microsoft for the XBox 360 console. The Kinect’s sensing technology is described in |20| . The unit employs an infrared laser 
to project a speckle pattern ahead of itself, whose image is read back using an infrared camera offset from the laser projector. 
By correlating the acquired image with a stored reference image corresponding to a known distance, the Kinect computes a 
disparity map (standard terminology in stereo camera vision) of the scene which is employed to construct a 3D point cloud 
of the environment in the robot-fixed frame. The disparity maps computed by the Kinect, corresponding to individual pixels 
of the IR camera image, are available as 640 x 480 images at a rate of 30 Hz. Physically the IR camera has a total angular 
field of view of 57° horizontally and 43° vertically, such that the constructed point cloud is fairly “narrow” as compared to 
time-of-flight scanning laser units such as the Hokuyo UTM-30LX (270°) or the Velodyne HDL-32E (360°). The maximum 
depth ranging limit of the Kinect is 5 m, in contrast to the UTM-30LX (30 m) and HDL-32E (70 m). The Kinect can only 
be utilized under indoor lighting conditions. But conversely, the Kinect is much less expensive than the Hokuyo (about 4500 
€) and Velodyne (about 22000 €) units and unlike the scanning units is not prone to point cloud deformation at high vehicle 
speeds. The scan matching algorithms developed for Kinect point clouds are adaptable to these units. 

A comprehensive analysis of the Kinect’s accuracy and precision is carried out in ED, showing that point cloud measurements 
are affected by both noise and resolution errors which grow significantly with distance from the camera. Consequently the 
recommended usable range interval is 1 < z < 3 m. 

B. ICP Algorithm 

The ICP algorithm g), |T0] is an iterative procedure for finding the optimum rigid-body transformation (SR, ST) e 50(3) x 
M 3 between two sets of points in R 3 (clouds) {«, } and { b ,}, which do not necessarily have equal number of entries. Following 
the taxonomy in Il22l an ICP algorithm consists of the following (iterated) sequence of steps: 

1) Select source points from one or both clouds. We select a total of 3000 points (about 1% of the available) from both clouds 
and located away from edges, compute their corresponding unit normals {rii} by fitting a plane through neighboring 
points 11231 . and employ the strategy of normal-space sampling Il22l . 

2) Match the source points with those in the other cloud(s) and reject poor matches. We employ a nearest-neighbor search 
accelerated by a k — d tree (24), then reject pairs whose point-to-point distance exceeds the threshold value of 0.25 m 
or whose associated normals form an angle larger than 45°. 

3) Minimize the error cost function. We choose the point-to-plane ICP variant da 

N 

f(SR, ST) = [(Mfai + ST- b t ) ■ m ] 2 (7) 

i— 1 

and apply linearization to solve 0 as explained below. 

4) Terminate if convergence criteria met, else goto 1). For simplicity we do not employ an early stop condition and always 
run 25 iterations of the ICP algorithm. 

The heart of the ICP algorithm lies in the way the matching step is done, as the points of the two clouds are arbitrarily labeled 
and do not initially match correctly. Linearization is justified provided SR is close to /, equivalent to clouds {a,} and { b ,} 
starting off close to each other. As explained in Section UlI-DI the closeness assumption is valid due to pre-aligning of the two 
clouds using an estimate of pose obtained at the prediction step of the overall filter. We thus employ linearization 0 in 0 
to obtain 

f(x) = ^ [(XR X Qi + X T + CLi - bi) ■ m] [(«i X n i ) ' X R + n i ■ X T + Tli ■ (ttj - 6j)] 


( 8 ) 
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where we have used the scalar triple product circular property (a x b) • c = (b x c) • a. In the linearized context, minimizing 
the ICP cost function 0) is equivalent to minimizing ([8]> / : R 6 R by choice of x = [xr xt] t • We define the terms 

Hi := [{at x rii) T nf] y t := nf (a* - bf) 


to rewrite © as 

/(*) = E [HiX + Vi ] 2 

i 


(9) 


The minimum of © f(x) is achieved at x for which df /dx = 0 since d 2 f/dx 2 = 2 A > 0 everywhere with 


A ■= 

b := 


Y,m T Hi = Y, 

i i 

E^)^ = E 

i i 


(ai x rii){ai x n,) T 
rij(a, x rii) T 

(ai x nf)nf (a* - 6,) 
riinj(ai - bi ) 


(a* x rii)nj 

T 

riirii 


This minimum is found by solving the linear system of equations Ax = — b. Then up to second order terms, (R, p) = 
(I + S(xr),x p ) is the rigid-body transformation which minimizes the point-to-plane error ©. 


C. Covariance of ICP output 

As stated in Section Illl-AI the measured point clouds { a ,} and {/;,} are affected by sensor noise and resolution errors. 
We require a method to estimate the covariance (uncertainty) of the rigid-body transformation obtained by running the ICP 
algorithm between measured point clouds. 

We continue to assume {ai} and {/;,} start close to each other (the open-loop integration of the motion sensors allows to 
pre-align the clouds). We can thus model the (point-to-plane) ICP as a linear least-squares estimator minimizing © resp. ©. 
Let xicp denote the estimate computed by the ICP from noisy point cloud data {a,} and { b 7 } and x* the true transformation. 
The associated covariance is 

co v(x IC p) = E {[x IC p - X*) ( Xicp - X*) T ^ (10) 

Based on the linear least-squares cost function © we define the residuals as the error purely due to sensor noise, that is, the 
error between each measured point and the original point transformed through the true transformation: 


n:=HiX*+yi (11) 

The estimate xicp minimizing © was already shown to be xicp = —A~ l b = — [Yli(Hi) T Hi] 1 12i(Hi) T yi. Rewrite this 
using Cjj} as x IC p = [J2i( H i) TH i] 1 ( H i x * - r i) = x* - A- 1 J2i(Hi) T ri and (Q© becomes 

. T \ 

co v(x IC p) =e((- A -1 ^ ~2(Hi) T n ) ( - A -1 YXHjfn 


E E ((HifEMHj) [ 'Y J (H-i) TH i 


i 3 


( 12 ) 


In order to evaluate (THU) we need to assume a noise model on the residuals r, in i | I I | i. Expanding the right-hand side using 
the definitions of //, and y, we have 

n = (ai x m) T x* R + nf x* T + nf (a t - bi) = [(xj, x af) + x* T + (a, - bf)] ■ Hi := 


where Wi € R :! represents the post-alignment error of the j th point pair due (only) to the presence of sensor noise in point 
clouds {ai} and {bi}. The residual r t = w-i ■ rii = nf Wi now represents the projection of Wi and (IT2l) becomes 


cov(a : IC p) 


E(^) T ^ 


EE {(HifnfE(w i wf)n j H j 


) f Ew > 5 



(13) 


The classical Hessian method |f25l , (T7| , |fl8l consists of assuming the post-alignment errors w, are independent and identically 
normally isotropically distributed with standard deviation a. Under these assumptions E(wiwJ) = E(wi)E(wJ) = 0, i X 3 
and the double sum in dl~!Tb reduces to a single sum (nf n t = 1 for unit normals): 


co v(x IC p) = 


i -i 


E(^) T ^: <J 2 Y( H ^ Th * J2^ TR ‘ 


1 -1 




«-i 


This is precisely the covariance of a linear unbiased estimator using observations with additive white Gaussian noise c.f. l26l 
p. 85]. However, there is a catch: for a Kinect sensor with N « 300 000 points per cloud and a « 1 cm for depth range 
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1 < z < 3 m ETl Fig. 10], the above expression yields a covariance matrix with entries on the order of nanometers, a 
completely overoptimistic result for the low-cost Kinect sensor. This result stems from the independence assumption which 
makes the estimator converge as 1/VN where N is the (high) number of points. 

We thus need to consider a more complete error model for the residuals. In addition to random noise on the residuals whose 
contribution to cov(a '-icp) rapidly becomes negligible as the number of point pairs N increases, the Kinect exhibits resolution 
errors on the order of 6 ~ 1 cm for 1 < z < 3 m ETl Fig. 10] due to quantization errors incurred during IR image correlation 
to a reference image as discussed in Section IIII-AI These resolution errors are non-zero mean and are not independent of each 
other and actually constitute the dominant limitation in accuracy. As shown in our preliminary work f27 j, this leads to the 
following approximation of the covariance matrix (IT3l) : 


cov(a : IC p) « 5' 


N 


ym 



(cii x x Hi ) 1 

ru(ai x m) T 


(cii x rii)nf 


-l 


T 

riin\ 


(14) 


where N p = 3 is the number of plane “buckets” used for normal-space sampling l22ll in Step 1 of the ICP algorithm, 
c.f. Section UlI-BI Note that the covariance matrix correctly reflects the observability of the environment: for instance if the 
environment consists of a single plane, all rii s are identical leading to rank deficiency of the matrix to be inverted, thus the 
covariance is infinite along directions parallel to the plane. Also note that the overall ICP variance is on the order of the 
Kinect’s precision 8 2 in the case of full observability, as expected. 


D. Scan Matching as Measured Output 

Suppose a 3D map of the environment made up of point clouds is already available, built either by the robot or from 
lasergrammetry. Assuming the scan from the on-board depth camera and the map contain a set of identical features, the robot’s 
pose with respect to the map can be identified through an ICP algorithm. However, due to sensor noise and quantization effects, 
the computed pose is noisy. 

Denote the initial estimate of the robot’s pose (obtained from numerical integration of the vehicle dynamics ([!])) as (R~ ,p~). 
This estimate is used to pre-align the two scans (the current scan and the map) in the body-fixed frame; this is required since 
the ICP does not guarantee global convergence, and in fact can easily get stuck at a local minimum. The robot’s true pose can 
then be expressed up to second order terms as ( R~(I + S(x* r )),R~x* + p~ ), that is the vector (x R ,x*) £ R 6 is defined as 
the discrepancy of the initial estimation (R~ ,p~) and the true pose ( R*,p*) projected in the Lie algebra R 6 and is the vector 
to be estimated by the ICP algorithm. 

Based on Section flll-Cl we have seen that the ICP output writes xicp = (x Rl x p ) + {v R . u p ) where the covariance of the 
vector (z/fj, v p ) £ R 6 is given by (fl3] > and can be approximated by the simple to compute expression (IT-fl i. Using the linearity 
of the map S, we see that / + S((xicp)r + Pr) = I + S(x* R ) + S(v R ) and so the pose output from scan matching is (up to 
second-order terms) 


The above is a noisy version of (0. where the noise covariance cov(v), v := \v R v p ] T is (approximately but efficiently) 
computed by (fT-fl ). 


+ 


R*S{v r )\ 
R*v p J 


(15) 


IV. State Estimator Design 

A. Invariant Observer 

We first design an invariant observer for the nominal (noise-free) system (|TJ. (O by following the constructive method 
in E), El; a tutorial presentation is available in El and so the calculations will be omitted. We consider the case where 
the Special Euclidean Lie group SE{ 3) = SO(3) x R 3 acts on the SE( 3) state of ([TJ by left translation, which physically 
represents applying a constant rigid-body transformation (f? 0 ;Po) to ground-fixed frame vector coordinates. This leads to the 
nonlinear invariant observer 

where L R , L^, L P R , L p are R 3x3 gain matrices and 

Er := -S~ 1 (n(R T y R )), E p = R T (p - y p ) 

are the invariant output error column vectors. Standard computations in the framework of symmetry-preserving observers show 
the associated invariant estimation errors rj R = R T R, rj p = R T (p — p) have dynamics 

V R = - S(ut)rj R + tjrS{u}) + rj R S (L r E r + L*E p ) 

Vp = ~S(u;)r] p + rj R p - p + vr{L r E r + L p E p ) 


_ f RS(uj) 
V Rd 


R 


S{L r E r + L*E p ) 
L p r E r + L p E p 


(16) 


( 17 ) 
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and stabilizing the (nonlinear) dynamics ( fTTl to rj = I by choice of gains L leads to an asymptotically stable nonlinear 
observer (fl6l i. The stabilization process is simplified by (fl7l > not being dependent on the estimated system state x; indeed the 
fundamental feature of the invariant observer is that it guarantees rj = Y(r],I(x,u)) 1731 Thm. 2] where I(x,u) = [cu p] is 
the set of invariants in the present example. We will compute the stabilizing gains using the Invariant EKF method in order to 
handle variable scan matching observability of the environment. 


B. Invariant EKF 


The Invariant EKF HQ is a systematic approach to computing the gains K of an invariant observer by linearizing its invariant 
estimation error dynamics. A noisy version of ([]} is readily obtained by accounting for the sensors’ noise by expressing the 
measured angular and linear velocities as the noisy vectors u> = ui + i/ u and fi = jl + and letting Q v := cov[zx w v^] T 
denote the process white noise covariance matrix. The entries of Q v can be identified directly from logged sensor data, while 
R v := co v[rr v p ] T is computed by (IT3l) (or more simply (1141 )). 

Following the IEKF method, we first write a noisy version of the error dynamics (fTTl ) where we let w = u> — v bJ and 
p = ft — Vp, where ui, ft denote the noisy inputs read from onboard sensors, and the output is given by 03. This equation is 
linearized using the standard methodology of symmetry-preserving observers, i.e. using vectors of R :i to denote the orientation 
error by letting rjR := I + S(C,r), (r £ R 3 by <(4]) and letting r] p := f p £ R 3 . Up to second order terms in £, v, the noisy 
version of (fTTb is approximated by the following linear equation: 


d 

dt 


-5(w) 0 

-Sift) -S(oj) 


+ 


L P r 


L ?' 
LP P 


L R n 

L P n 


L l 


The rationale of the IEKF is to tune the gains through Kalman theory in order to minimize at each step the increase in the 
covariance of the linearized error (. This is done through the standard Kalman filter equations, letting 


' 1 1 
Co tjo 

■£* 'fL 

1 1 

>3 

° 

to 

II 

-/ 0 ' 
0 -J 

, c = 

-/ 0 ' 
0 -I 

, D = 

0 ' 

0 -/ 

II 

\L R n 

k 

lp p\ 


and tuning the gains through the standard Riccati equations in continuous time of the Kalman-Bucy filter 

P = AP + PA t - PRf 1 P + Q v 
K = —PRf 1 


( 18 ) 


(19) 


where the gains L making up K are employed in the invariant observer (1 1 6b . 

Remark the IEKF filter matrices (A,B,C,D) are dependent on the system trajectory only through the oj rn and //„, terms 
in A, and do not depend on the estimates (R,p) as in the usual Extended Kalman Filter. The interest of the Invariant EKF 
is indeed the reduced dependence of the linearized system on the estimated trajectory of the target system. In our present 
example the (A,B,C,D) matrices are guaranteed not to depend on the estimated state, which increases the filter’s robustness 
to poor state estimates and precludes divergence (c.f. Section Q}. 


C. Multiplicative EKF design 

For comparison purposes consider a typical HI Multiplicative Extended Kalman Filter (MEKF) design for our system. The 
governing system equations are given by (jT]) with noise models ([3]) and output model (1 1 51 ): 


R = RS{uj — Vu) 

P = ~ Vff) 


VR 


R + RS{vr) 

y P . 


p + Rv p 


( 20 ) 


We linearize (l20l) about a nominal system trajectory (R. ft). Remark (R R) (f 50(3) is not a valid linearized system state. 
Instead define the multiplicative attitude error T := R r R £ 50(3) such that for R close to R, T is close to I. By © 
T := I + S(S"f), S') £ R 3 and so R T R = I + S(5j) => R, — R = RS[5 r )). We define Sp = p — p, the output errors 
Si/r := 5 _1 [Tr(R T yR)] with n : 50(3) —> so( 3) from Section Hl-CI and Sy p := y p — p and obtain the linearized system 


~S{Q) 

—RS(p) 


-I 0 ' 
0 -R 


1 _ 1 

= 

7 

0 

o' 

I 

1 _ 1 

+ 

7 

0 

O' 

R 

Rr 

Rp 


( 21 ) 


an LTV system tractable using the classical Kalman Filter. The resulting [Sp S')] estimate is used to update the estimated 
state of the nonlinear system (l20l) as follows. Note S(S'y) £ so( 3) corresponds to T = R 7 R £ 50(3) and by Section 1 11 -Cl 
exp : so(3) —> 50(3) is the matrix exponential. Thus the estimated states are updated as 


p + =p + Sp and R + = R exp S(S'y) 
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The main difference between the MEKF and the IEKF is that the former linearizes the system dynamics ( l20t about a nominal 
trajectory, while the latter linearizes the invariant estimation error dynamics ( fT71 ) about identity. As discussed in Section IIV-AI 
and IIV-BI the latter dynamics do not depend on the estimated state x such that the linearized IEKF system is guaranteed 
to be robust to poor estimates of state. Meanwhile the MEKF cannot make this guarantee and indeed the linearized system 
matrices (1211 depend on the estimated state via R. The difference in experimental estimation performance of the two designs 
will be demonstrated in Section [V] 


A. Hardware platform 


V. Experimental Validation 



Fig. 1. The Wifibot Lab v4 Robot (left); Experiment area with motion-capture system (right) 


The wheeled robot used for our experiments is shown on the left of Figure |T] The robot is equipped with an Intel Core i5- 
based single-board computer running Ubuntu Linux, WLAN 802. llg wireless networking, all-wheel drive via 12 V brushless 
DC motors, and a Kinect camera providing 3-D point cloud scans of the environment. The experimental testing area is shown on 
the right side of Figure [I] and consists of an open area surrounded by a set of seven S250e cameras employed by an OptiTrack 
motion capture system to provide a set of ground truth (reference trajectory) data for the experiments with sub-millimeter 
precision at a rate of 120 Hz. The wooden parquet floor seen in Figure [T] is not perfectly flat, causing small oscillations in 
the vehicle’s attitude and height which will be visible in the experimental plots. A number of visual landmarks (rectangular 
boxes) were placed randomly around the experimental area in order to provide good scan-matching conditions. 

The robot is equipped with independent odometers on the left and right wheels. The two odometer counts are averaged 
and converted to forward velocity /i x by dividing by an experimentally-identified constant K \ = 788 m -1 representing the 
number of encoder counts per meter of travel. Assuming zero side slip as well as zero out-of-plane velocity, we obtain the 
body-fixed velocity vector p = [p x 0 0] T m/s used in noiseless vehicle dynamics ([]}. Since the present vehicle is not 

equipped with a rate gyro, we employ the difference in odometer counts, identified constant K 2 = 0.44 m representing the 
lateral distance between wheel-ground contact points and K t to compute in-plane angular velocity u z then employ the angular 
velocity vector w = [0 0 oj z ] t rad/s in (Q3. i.e. assume zero roll and pitch angular velocity due to the vehicle being level, 

with the non-flatness of the floor reflected by additive sensor noise vectors and v u . The covariances of // /( and were 
assigned by first identifying the on-axis variances cr 2 ^ = 0.01 2 m 2 /s and <t 2 ^ = 0 .02 2 rad 2 /s of data logged during respectively 
constant-velocity advance and constant-velocity circle trajectories, then taking diag(cov(r , At )) = [a 2 x 0.1<r 2 x 0.1<r 2 J and 

diag(cov(iAj)) = [O.lcr 2 ^ O.lcr 2 ^ <r 2 J on account of the uneven terrain. 

Clearly the assumptions in the previous paragraph are both optimistic and ad-hoc, for instance the zero-slip assumption 
does not fully hold, the noise covariances are assigned heuristically, and the encoder-derived data is subject to identification 
errors of n and quantization effects. This is acceptable for our purposes, however, since we are interested in comparing the 
experimental performance of two competing filter designs more than the absolute accuracy of the estimates. Since we will 
employ identical sensor data in both designs, we will be able to make a fair comparison between the two. 

For each experiment, the sampling rate of the wheel odometers was set to 50 Hz and the Kinect depth images to 1 Hz. 
The latter is a fairly slow rate for scan matching and was chosen specifically to test the robustness of each of the two filters. 
The trajectories were steered in open-loop mode by setting left and right wheel velocities. The map was built from the initial 
robot’s pose. The resulting sensor data was logged to the on-board memory and then run through both the Invariant EKF and 
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the Multiplicative EKF whose designs were covered throughout Section [TV] When an image was found to possess a substantial 
amount of novel information compared to the existing map it was aggregated to the existing map. For fairness of comparison 
the ICP algorithm and associated parameters e.g. number of point pairs and rejection criteria (c.f. Section UlI-BI) were identical 
among the filters. 

B. First Experiment 

In the first experiment, the Wifibot starts stationary near an edge of the bounding area wall, then advances in a straight line 
towards the opposite wall, where it stops. The position and attitude (converted to Euler angles) estimates from the IEKF and 
MEKF designs are plotted against the OptiTrack ground truth in Figure [2] For visualization purposes Figure [2] also provides 
an overhead view of the positions, plus the values of the 3x3 subset of the Kalman gain matrix K acting on planar position 
and heading angle error states via their corresponding errors. 
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Fig. 2. Lineal* trajectory experiment: IEKF (^—), MEKF (^— ), Ground truth 


From Figure [2] we see that the IEKF and MEKF designs perform nearly the same. In the final stationary configuration, the 
IEKF exhibits an in-plane error of (Ax, A y, Aip) = (6.2 cm, 8.5 cm, 1.6°) from the ground truth, while for the MEKF this error 
is (6.0 cm, 7.8 cm, 1.9°) — the difference being within the uncertainty of the system. The RMS of the vector of errors between 
estimated trajectories and ground truth throughout the full experiment are RMS(Ax, Ay, Atp) = (3.8 cm, 4.9 cm, 1.1°) for 
the IEKF and (3.6 cm, 4.4 cm, 1.0°) for the MEKF, again within uncertainty. In addition the Kalman gain matrix entries are 
seen to behave nearly the same. This almost-identical performance is as expected because for a linear trajectory both filters 
reduce to a linear Kalman filter. We will be considering a non-linear robot trajectory in the next experiment. 

Note that the OptiTrack data reflects the unevenness of the wooden floor mentioned in Section IV-AI which causes the robot 
to sway with an order-of-magnitude amplitude of 1° in the roll and pitch axes and heave with amplitude 1 cm in the vertical 
axis. This acts as an exogenous disturbance to the system, and as previously discussed is accounted for by the additive process 
noise vectors v^ and v u . 

C. Second Experiment 

In the second experiment the robot begins stationary, then executes a circular (thus non-linear) trajectory consisting of two 
identical circles traveled in the counter-clockwise direction, obtained by setting differing constant speeds on the left and right 
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wheels. As in the previous experiment, for each of the two filters we plot the estimated system states versus the ground truth 
reported by the OptiTrack system, as well as an overhead view of the positions for visualization purposes and a 3 x 3 subset 
of the Kalman gain matrix K entries. The results are shown in Figure [3] 
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Fig. 3. Circular trajectory experiment: IEKF (^^), MEKF (**), Ground truth (—) 


Figure [^clearly shows that the performance of the two estimators is no longer identical, and that the IEKF-computed estimates 
are closer to the ground truth than the MEKF ones. In the final configuration, the IEKF has an error of (Ax, Ay, A i/j) = 
(14.3 cm, 1.2 cm, 9.7°) while the MEKF has (29.4 cm, 5.7 cm, 21.3°). For RMS values of the estimation discrepancy vectors, 
the IEKF shows RMS(Ax, Ay, Aip) = (10.6 cm, 14.2 cm, 5.7°) and the MEKF (18.5 cm, 21.3 cm, 12.4°). Thus in this non¬ 
linear trajectory case, the IEKF exhibits an estimation performance advantage over the MEKF. 

The Kalman gain matrix entries plotted in Figure [3] demonstrate a feature of the IEKF already discussed in Section IIV-BI 
the independence of the linearized Kalman Filter dynamics (as the matrices ( A,B,C,D ) used to compute P hence K) on 
the estimated system state R, c.f. (Ti~8l) for IEKF versus (fTil for the MEKF. Indeed throughout the circular trajectory where R 
varies with time, the IEKF gains remain approximately level while the MEKF gains oscillate — note that they can not remain 
absolutely level, as the IEKF gains depend on the scan matching covariance R v which varies throughout the trajectory due to 
inhomogeneities in the environment. This is logical as the gains should definitely adapt to the environment’s observability, such 
that the filter does not trust the scan matching output when the environment is underconstrained and contains no information 
along some specific direction(s). The IEKF design’s greater independence from estimated states is expected to provide better 
estimation accuracy than in the MEKF design, and this is indeed what we see here. 

VI. Conclusions 

We have successfully designed and experimentally validated a Kinect depth camera scan matching-aided Invariant EKF-based 
localization system and compared it against a Multiplicative EKF-based design in terms of theoretical features and estimation 
performance relative to a ground truth. The fundamental advantage of the IEKF is its guaranteed increase in robustness to poor 
estimates of state, a fundamental weakness of the MEKF design. 

We described both filter designs and tested them experimentally, demonstrating that the state estimates follow the ground 
truth and confirming the consistency of the novel ICP covariance estimation (fl~4l) derived in Section llll-CI Experimental testing 
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illustrated the theoretical features and advantages of the IEKF and demonstrated its improved estimation accuracy over the 
MEKF design for a circular (non-linear) robot trajectory. 

Future work involves mixing the introduced techniques with pose-SLAM and smoothing in order to take advantage of the 
loop closures which may occur during the motion, tackling the case where a 3D map is not available, and applying the IEKF 
to more complicated problems such as outdoor mobile cartography. 
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